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Abstract This paper studies the H Sobolev seminorm of quadratic functions. The research 
is motivated by the least-norm interpolation that is widely used in derivative-free optimiza- 
tion. We express the seminorm of a quadratic function explicitly in terms of the Hessian 
and the gradient when the underlying domain is a ball. The seminorm gives new insights 
into least-norm interpolation. It clarifies the analytical and geometrical meaning of the ob- 
jective function in least-norm interpolation. We employ the seminorm to study the extended 
symmetric Broyden update proposed by Powell. Numerical results show that the new thoery 
helps improve the performance of the update. Apart from the theoretical results, we propose 
a new method of comparing derivative-free solvers, which is more convincing than merely 
counting the numbers of function evaluations. 
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1 Motivation and Introduction 

Consider an unconstrained derivative-free optimization problem 

minF(x), (1.1) 

xeR" 

where F is a real-valued function whose derivatives are unavailable. Problems of this type 
have numerous applications. For instance, they have been employed to solve the helicopter 
rotor blade design problem [4,3,2], the ground water community problems [27,25], and the 
problems in biomedical imaging [36,37]. 

Many algorithms have been developed for problem (1.1). For example, direct search 
methods (see Wright [53], Powell [40], Lewis, Torczon, and Trosset [33], and Kolda, Lewis, 
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and Torczon [31] for reviews), line search methods without derivatives (see Stewart [48] for 
a quasi-Newton method using finite difference; see Gilmore and Kelley [26], Choi and Kel- 
ley [8], and Kelley [29, 30] for Implicit Filtering method, a hybrid of quasi-Newton and grid- 
based search; see Diniz-Ehrhardt, Martinez, and Raydan [22] for a derivative-free line search 
technique), and model-based methods (for instance, a method by Winfield [52], CQBYLA by 
Powell [39], DFQ by Conn, Scheinberg, and Toint [12,13,14], UOBYQA by Powell [41], a 
wedge trust region method by Marazzi and Nocedal [34], NEWUOA and extensions by Powell 
[44,45,46,47], CONDOR by Vanden Berghen and Bersini [49], BOOSTERS by Oeuvray and 
Bierlaire [38], ORBIT by Wild, Regis, and Shoemaker [51], and MNH by Wild [50]). We 
refer the readers to the book by Brent [6], the one by Kelley [28], and the one by Conn, 
Scheinberg, and Vicente [15] for extensive discussions and the references therein. 

Multivariate interpolation has been acting as a powerful tool in the design of derivative- 
free optimization methods [52,39,16,12,13,14,5,41,34,42,44,20,19,49,17,45,51,50,38, 
46,47, 18]. The following quadratic interpolation plays an important role in Conn and Toint 
[16], Conn, Scheinberg, and Toint [12,13,14], Powell [43,44,45,46,47], and Custodio, 
Rocha, and Vicente [18]: 

min||v2e||2+a||Ve(.,o)|li 

Gee (1.2) 
s.t.Q{x)=f{x), xeS, 

where Q is the linear space of polynomials of degree at most two in n variables, xq is a 
specific point in R", ct is a nonnegative constant, 5 is a finite set of interpolation points in 
R", and / is a function' on S- We henceforth refer to this type of interpolation as least-norm 
interpolation. 

The objective function of problem (1.2) is interesting. It is easy to handle, and practice 
has proved it successful. The main purpose of this paper is to provide a new interpretation of 
it. Our tool is the //' Sobolev seminorm, which is classical in PDE theory but may be rarely 
noticed in nonlinear optimization. We will give new insights into some basic questions about 
the objective function in (1.2). For example, what is the exact analytical and geometrical 
meaning of this objective! Why to combine the Frobenius norm of the Hessian and the 
2-norm of the gradient? What is the meaning of the parameters xq and cr? 

This paper is organized as follows. Section 2 introduces the applications of least-norm 
interpolation in derivative-free optimization. Section 3 presents the main theoretical results. 
We first study the //' seminorm of quadratic functions, and then employ the //' seminorm 
to investigate least-norm interpolation. The questions listed above are answered in Section 
3. Section 4 applies the theory obtained in Section 3 to study the extended symmetric Broy- 
den update proposed by Powell [47]. We show an easy and effective way to improve the 
performance of the update. Section 5 concludes our discussion. 

The main contributions of the paper lie in Section 3 and Section 4. Besides the theoreti- 
cal results, the numerical experience in Section 4 is also a highlight. We show an interesting 
numerical example which suggests that it is inadequate to compare derivative-free solvers 
by simply counting the numbers of function evaluations. To obtain more convincing com- 
parison, we propose a new method of testing derivative-free solvers by introducing statistics 
into the numerical experiments. See Subsection 4.3 for details. 



Notice that / is not always equal to the objective function F, which is the case in Powell [43,44,45,46, 
47]. See Section 2 for details. 
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A remark on notation. We use the notation 



min \j/{x) 

(1.3) 

s.t. mmd)(x) 

xex 



for the bilevel programming problem 

min i//(x) 



s.t. X e [argmin(^(i^)]. ^^■'^^ 

E,ex 



2 Least-Norm Interpolation in Derivative-Free Optimization 

Least-norm interpolation has successful applications in derivative-free optimization, espe- 
cially in model-based algorithms. These algorithms typically follow trust-region methodol- 
ogy [11,42]. On each iteration, a local model of the objective function is constructed, and 
then it is minimized within a trust-region to generate a trial step. The model is usually a 
quadratic polynomial constructed by solving an interpolation problem 

Q{x)=F{x), xeS, (2.1) 

where, as stated in problem (1.1), F is the objective function, and S is an interpolation set. 
To determine a unique quadratic polynomial by problem (2.1), the size of S needs to be 
at least {n+ l)(n-|-2)/2, which is prohibitive when n is big. Thus we need to consider 
underdetermined quadratic interpolation. In that case, a classical strategy to take up the 
remaining freedom is to minimize some functional subject to the interpolation constraints, 
that is to solve 



min S'iQ) 

26 e 

s.t. Q{x)=F{x), xeS, 



(2.2) 



being some functional on Q. Several existing choices of "J lead to least-norm interpola- 
tion, directly or indirectly. Here we give some examples. 

Conn and Toint [16] suggests the quadratic model that solves 

min|lv2e|l2 + |lVe(.^o)|li 

(2.3) 

s.t. Q{x)=F{x), xeS, 

where xo is a specific point in R". Problem (2.3) is a least-norm interpolation problem with 
CT = 1 . It is reported that the resultant algorithm is substantially more robust than a code 
based on LANCELOT [10] subroutines and finite-differences, although no motivation for (2.3) 
is given. 

Conn, Scheinberg, and Toint [12, 13, 14] builds a quadratic model by solving 

min llV^ell^ 

Gee (2.4) 



s.t. Q{x) =F{x), xeS. 
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It is considered as the best general strategy for the DFO algorithm to select a unique model 
from the pool of the possible interpolation models [14]. Wild [50] also works with the model 
defined by (2.4). Problem (2.4) is a least-norm interpolation problem with (7 = 0. Conn, 
Scheinberg, and Vicente [15] explains the motivation for (2.4) by the following error bounds 
of quadratic interpolants. 

Theorem 2.1 [15] Suppose that the interpolation set S = {yo, yi, ■ ■ ■ , ym} (in> n) is con- 
tained in a ball 'S{yo,r) (r > 0), and the matrix 

L= ^{yi -yo ■■■ ym -yo) (2.5) 

has rank n. IfF is continuously differentiable on 'S{yo,r), and VF is Lipscitz continuous on 
(Biyojr) with constant V > 0, then for any quadratic function Q satisfying the interpolation 
constraints (2.1), it holds that 

llVe(^)-VF(x)||2< ^llL'-|l2(v + ||v2e||2)r, •ixe^(yo,r), (2.6) 

and 



|e(x)-FW|< (^^|tL^||2 + -j(v+||V^e||2)r^ 'ixe'B{yo,r). (2.7) 

In light of Theorem 2.1, minimizing some norm of Q will help improve the approximation 
of gradient and function value. Notice that V^Q appears in (2.6-2.7) with 2-norm rather than 
Frobenius norm. 

Powell [43,44,45] introduce the symmetric Broyden update to derivative-free optimiza- 
tion, and it is proposed to solve 

min llV^e-V^gollF 

(2.8) 

s.t. Q{x) =F{x), xeS 

to obtain a model for the current iteration, provided that Qo is the quadratic model used in 
the previous iteration. Let D = Q — Qo and f = F — Qo, and then (2.8) is equivalent to 

min IIV^DIII 

DeQ (2.9) 
s.t. D{x) = f{x), xeS. 

Thus problem (2.8) is essentially a least-norm interpolation problem about Q — Qo- The 
symmetric Broyden update is motivated by least change secant updates [21] in quasi-Newton 
methods. One particularly interesting advantage of the update is that, when F is a quadratic 
function, the solution of problem (2.8) has the property [43,44,45] that 

\\V^Q+-V^F\\l 
= It V^go - - \\W^Q+ - V^QoWl (2.10) 

<i|v2eo-W||2. 

Thus V^Q+ approximates V^F better than V^Qq unless V^g^ = V^gg. Additionally, if (2.8) 
is employed on every iteration, then (2.10) implies that the difference V^Q+ — V^Qo will 
eventually converge to zero, which helps local convergence, in light of the classical theory 
in Broyden, Dennis, and More [7]. 
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Powell [47] proposes the extended symmetric Broyden update by adding a first-order 
term to the objective function of problem (2.8), resulting in 



xo and a being specifically selected parameters, a nonnegative. Again, (2.11) can be in- 
terpreted as a least-norm interpolation about Q — Qq. Powell [47] motivates (2.11) by an 
algebraic example for which the symmetric Broyden update does not behave satisfactorily. 
We will study the extended symmetric Broyden update in Section 4. 

Apart from model-based methods, least-norm interpolation also has applications in di- 
rect search methods. Custodio, Rocha, and Vicente [18] incorporates models defined by (2.4) 
and (2.8) into direct search. The authors attempt to enhance the performance of direct search 
methods by taking search steps based on these models. It is reported that (2.4) works bet- 
ter for their method, and their procedure provides significant improvements to direct search 
methods of directional type. 

For more discussions about least-norm interpolation in derivative-free optimization, we 
refer the readers to Chapter 5 and 1 1 of Conn, Scheinberg, and Vicente [15]. 



3 The // Sobolev Seminorm of Quadratic Functions 

In Sobolev space theory [1,24], the i/' seminorm of a function / over a domain Q is defined 
as 



In this section, we give an explicit formula for the H seminorm of quadratic functions 
when 12 is a ball, and accordingly present a new understanding of least-norm interpolation. 
We prove that least-norm interpolation essentially seeks the quadratic interpolant with min- 
imal //' seminorm over a ball. Theorem 3.1, Theorem 3.2, and Theorem 3.3 are the main 
theoretical results of this paper. 



3.1 The// Seminorm of Quadratic Functions 

The //' seminorm of a quadratic function over a ball can be expressed explicitly in terms of 
its coefficients. We present the formula in the following theorem. 

Theorem 3.1 Let xq be a point in R", r be a positive number, and 



min II V^e- V^golli + CJ|| Ve(xo) - VQo{xo)\\l 



(2.11) 



s.t. Q{x)=F{x), xeS, 




(3.1) 



'S = {xeR": \\x-xo\\2<r}. 



(3.2) 



Then for any Q E Q, 




n + 2 



\\y'Q\\l+\\mxo)\\l 



(3.3) 



where V„ is the vohime of the unit ball in R". 
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Proof Let 
Then 



g = Ve(^o), and G = V^Q. (3.4) 



\G{x-xo)+g\\ldx 



L 
I 

J\\x\\2<l- 

[ {x'G^x + lx'Gg+WgWDdx. 



l-t--foll2<'' 

Gx + gW^dx 



(3.5) 



i 



'\\x\\2<r 

Because of symmetry, the integral of x^Gg is zero. Besides, 

\\gf2dx = Vnn\g\\l (3.6) 

kll2<'- 

Thus we only need to find the integral of x^G^x. Without loss of generality, assume that G is 
diagonal (if not, apply a rotation). Denote the !-th diagonal entry of G by G(„), and the i-th 
coordinate of x by . Then 

lx\\2</^^^^^ /. . I EgL4i I IIGII^ /. . xf,,dx. (3.7) 



"'ll-'-il2<'- Vl=l / "'IWl2<'- 



To finish the proof, we show that 

'' xf..dx= '"' ^ . (3.8) 



I 



V r"+2 



J xj^^dx = V„-ij\'^{i-u^)'^du = V„^iJ^^sm^ecos"0de, (3.9) 



'||x||2<r n + 2 

It suffices to justify (3.8) for the case r = 1 and n > 2. First, 

rl 

xj,^dx = V„-i I 

'M2< 

and, similarly, 

v„ = v„-i / '^cos"ede. (3.10) 

Second, integration by parts shows that 

K K 

/'^cos"+^ede = («+i) r^Wecos" Ode. (3.11) 

Now it is easy to obtain (3.8) from (3.9-3.11). □ 

Theorem 3.1 tells us that the //' seminorm of a quadratic function Q over a ball ® is 
closely related to a combination of HV^QIIf and ||Vg(xo)||2, -^o being the center of 'S, and 
the combination coefficients being determined by the radius of ®. This result is interesting 
for two reasons. First, it enables us to measure the overall magnitude of the gradient over 
a ball. This is nontrivial for a quadratic function, since its gradient is not constant. Second, 
it clarifies the analytical and geometrical meaning of combining the Frobenius norm of the 
Hessian and the 2-norm of the gradient, and enables us to select the combination coefficients 
according to the geometrical meaning. 
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3.2 New Insights into Least-Norm Interpolation 

The //' seminorm provides a new angle of view to understand least-norm interpolation. For 
convenience, we rewrite the least-norm interpolation problem here and call it the problem 
Pi(a): 



min ||v2e||2 + a||Ve(xo)||i 



(Pi(CT)) 



s.t. Q{x) = f{x), xeS. 
We assume that the interpolation system 

Q{x)=f{x), xeS (3.12) 

is consistent on Q, namely 

{QeQ: Q{x) = fix) for every xeSj^Ql. (3.13) 
With the help of Theorem 3.1, we see that the problem Pi [a) is equivalent to the problem 

min |e|Hi(3,) 

(P2('-)) 

s.t.Q{x)=f{x), xeS 

in some sense. The purpose of this subsection is to clarify the equivalence. 
When CT > 0, the equivalence is easy and we state it as follows. 

Theorem 3.2 If O > 0, then the least-norm interpolation problem Pi(ct) is equivalent to 
the problem V2{r) with r = ^ (n -|- 2)/(T. 

It turns out that the least-norm interpolation problem with positive G essentially seeks 
the interpolant with minimal seminorm over a ball. The geometrical meaning oi xq and 
a is clear now: xq is the center of the ball, and ^J{n + 2)/G is the radius. 

Now we consider the least-norm interpolation problem with (7 = 0. This case is par- 
ticularly interesting, because it appears in several practical algorithms [14,44,50]. Since 
Pi (ct) may have more than one solutions when a = 0, we redefine Pi (0) to be the bilevel 
least-norm interpolation problem 

min ||Ve(..o)||2 

s.t. min llV^ellF (Pi(0)) 
Gee 

s.t. Q{x)=f{x), xes. 
The redefinition is reasonable, because we have the following proposition. 

Proposition 3.1 For each CT > 0, the problem Pi (ct) has a unique solution Qa- Moreover, 
Qa converges^ to Qq when CT tends to 0+. 



- We define tlie convergence on Q to be tlie convergence of coefficients. 
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Proof First, we consider the uniqueness of solution when ct > 0. For simplicity, denote 

\Q\a=[P^Q\\l + o\\VQ{x,)\\^^"\ (3.14) 

Let q\ and q2 be solutions of the problem Pi (cj). Then the average of them is also a solution. 
According to the identity 

2m\\a + \q2\a)-\2{q\+q2)\a = -?2|tT, (3.15) 

we conclude that \q\ — q2\a = 0. Thus V^q\ = V^q2 and Vq{xQ) = Vq2{x{)). Now with the 
help of any one of the interpolation constraints qi {x) = f{x) = q2{x), x€ S, we find that the 
quadratic functions q\ and q2 are identical as required. 

When a = 0, we can prove the uniqueness by similar argument, noticing the identity 

^(llvVlli+l|v292||^)-||^v2(^i+^2)|l2 = l|lvV-V2^2||i, (3.16) 

and a similar one for the gradients. 

Now we show that Qa converges to go when ct tends to 0+. Assume that the conver- 
gence does not hold, then we can pick a sequence of positive numbers {ct^} such that {Ok} 
tends to but {2ctj } does not converge to Qq. Notice that 

\\y^Qa,\\F>\\y^Qo\\F, (3.17) 

and 

l|v2e^jli+CT,iive^,,(xo)iiB llv'Goii^+cj,||VGo(^o)iii (3.18) 

It follows that 

||VGa,(xo)||2<|!Veo(xo)||2. (3.19) 

By (3.18-3.19) and the interpolation constraints, {Qai-} is bounded. Thus we can assume 
that {2oj} converges to some Q that is different from Qq. Then inequalities (3.17-3.18) 
implies \\V^Q\\f = llV^gollf, and (3.19) implies ||Ve(xo)|!2 < || Veo(^o)|t2. Hence Q is 
another solution of the problem Pi (0), contradicting to the uniqueness proved above. □ 

Note that Proposition 3.1 implies the problem P2(^) has a unique solution for each pos- 
itive r. Now we can state the relation between the problems Pi (0) and P2('')- 

Theorem 3.3 When r tends to infinity, the solution ofV2(r) converges to the sohition of 
Pi(0). 

Theorem 3.3 indicates that, when ct = 0, the least-norm interpolation problem seeks the 
interpolant with minimal seminorm over W in the sense of limit. Thus Theorem 3.3 is 
the extension of Theorem 3.2 to the case ct = 0. 

The questions listed in Section 1 can be answered now. The meaning of the objective 
function in least-norm interpolation is to minimize the seminorm of the interpolant over 
a ball. The reason to combine the Frobenius norm of the Hessian and the 2-norm of the gra- 
dient is to measure the //' seminorm of the interpolant. The parameters xq and ct determines 
the ball where the seminorm is calculated, xq being the center and ^(n + 2)/CT being 
the radius. When ct = 0, we can interprete least-norm interpolation in the sence of limit. 
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4 On the Extended Symmetric Broyden Update 

In this section, we employ the seminorm to study the extended symmetric Broyden 
update proposed by Powell [47] . As introduced in Section 2, the update defines g+ to be the 
solution of 

min ||v2e-V2eolli + CT||Ve(xo)-Veo(^o)||i 

fieC (4j) 

s.t. Q{x)=F{x), xeS, 

F being the objective function, and Qq being the model used in the previous trust-region it- 
eration. When CT = 0, it is the symmetric Broyden update studied by Powell [43,44,45]. We 
focus on the case CT > 0. In Subsection 4.1, we interpret the update with the //' seminorm. 
In Subsection 4.2, we discuss the choices of xo and CT in the update, and show how to choose 
XQ and CT according to their geometrical meaning. In Subsection 4.3, we test our choices of 
xo and CT through numerical experiments. It is worth mentioning that, the numerical exper- 
iments in Subsection 4.3 are designed in a special way in order to reduce the influence of 
computer rounding errors and obtain more convincing results. 



4.1 Interpret the Update with the i/' Seminorm 
According to Theorem 3.2, problem (4.1) is equivalent to 



min |e-eo|„i(s) 

eeQ (4 2) 



s.t. Q{x) =F{x), xeS, 

where 



■3 = ixe'R" : \\x-xo\\2 < 



v/(« + 2)/ct} . (4.3) 



Thus the extended symmetric Broyden update seeks the closest quadratic model to Qq sub- 
ject to the interpolation constraints, distance being measured by the seminorm \-\h'{'3)- 

Similar to the symmetric Broyden update, the extended symmetric Broyden update en- 
joys a very good approximation property when F is a quadratic function, as stated in Propo- 
sition 4.1. 



Proposition 4.1 IfF is a quadratic function, then the solution 2+ of problem (4.1) satisfies 

~ |G+ -Go|^i(g), 



\Q+-F\lu,s) = |Go-F|^,(^)-|G+-Go|^i(3), (4.4) 



where ® is defined as (4.3). 

Proof Let Q, = 2+ +r(Q+ — F),t being any real number. Then Qt is a quadratic function 
interpolating F on S, as F is quadratic and 2+ interpolates it. The optimality of 2+ implies 
that the function 

<pW = la-QolHi(a) (4-5) 
attains its minimum when t is zero. Expanding (p(r), we obtain 



(p{t)=t^\Q+-F\l, + 2t 1^ [V(2+ - 2o) {x}] '[ViF-Qo){x)\dx+ 1 2+ -Qolli («, • 

(4.6) 
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Hence 

/ [V(e+ - Go) W]^[V(F - go) W] dx = 0. (4.7) 

Then we obtain (4.4) by considering (p ( — 1 ) . □ 

In light of Theorem 3.1, equation (4.4) is equivalent to equation (1.9) of Powell [47]. Notice 
that equation (4.4) implies 

[ \\VQ+{x)-VF{x)\\ldx< [ \\VQoix)-VF{x)\\jdx. (4.8) 

In other words, Vg+ approximates VF on ® better than Vgo unless Vg+ = VQo. 

4.2 Choices of xq and ct 

Now we turn our attention to the choices of xq and a in the update. Recall that the purpose of 
the update is to construct a model for a trust-region subproblem. As in classical trust-region 
methods, the trust region is available before the update is applied, and we suppose it is 

{x:\\x-x\\2<A}, (4.9) 

the point x being the trust-region center, and the positive number A being the trust-region 
radius. 

Powell [47] chooses xq and a by exploiting the Lagrange functions of the interpolation 
problem (4.1). Suppose that5 = {.Vo, yi, ■■ ym}, then the i'-th (i = 0, 1, . . . , m) Lagrange 
function of problem (4.1) is defined to be the solution of 

min|lv2/,.||2 + a||V/,-||2 

''<^2 (4 10) 

s.t. li{yj) = 5ij, ; = 0, 1, m, 

5ij being the Kronecker delta. In the algorithm of Powell [47], S is maintained in a way so 
that Qo interpolates F on 5 except one point, say yo without loss of generality. Then the 
interpolation constraints can be rewritten as 

Q{yj) - Qoiyj) = [F{yo) - Qoiyo)] Soy, j = 0,l,...,m. (4.11) 

Therefore the solution of problem (4.1) is 

Q+ = Qo+[F{yo)-Qoiyo)]lo. (4.12) 

Hence /q plays a central part in the update. Thus Powell [47] chooses xq and a by examining 
/q. Powell shows that, in order to make /q behaves well, the ratio ||xo — i:||2/4 should not 
be much larger than one, therefore xq is set to be x throughout the calculation. The choice 
of CT is a bit complicated. The basic idea is to balance ||V^/o||p and ct|| VZo(xo)||2. Thus a 
is set to T]/(§, where rj and are estimates of the magnitudes of ||V^Zo||f ^"d |[V/o(xo)||2, 
respectively. See Powell [47] for details. 

In Subsection 4.1, we have interpreted the extended symmetric Broyden update with the 
//' seminorm. The new interpretation enables us to choose xq and CT in a geometrical way. 
According to problem (4.2), choosing xq and CT is equivalent to choosing the ball ®. Let us 
think about the role that 2i plays in the update. First, S is the region where the update tries 
to preserve information from Qq, by minimizing the change with respect to the seminorm 
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' Ih' («) ■ Second, ® is the region where the update tends to improve the model, as suggested 
by the facts (4.4) and (4.8) for quadratic objective function. Thus we should choose S to be 
a region where the behavior of the new model is important to us. It may seem adequate to 
pick S to be the trust region (4.9). But we prefer to set it bigger, because the new model 2+ 
will influence its successors via the least change update, and thereby influence subsequent 
trust-region iterations. Thus it is myopic to consider only the current trust region, and a 
more sensible choice is to let S be the ball : ||x — ^"112 < MA}, M being a positive number 
bigger than one. Moreover, we find in practice that it is helpful to require ® D 5. Thus our 
choice of 'B is 

{x:\\x-x\\2<r}, (4.13) 

where 



Consequently, we set 



r = max|M4,max||jc-jc||2| . (4.14) 



n + 2 

XQ=x, and a = — (4.15) 



r 

Therefore our choice of xq coincides with Powell's, but the choice of a is different. 



4.3 Numerical Results 

With the help of the seminorm, we have proposed a new method of choosing a for the 
extended symmetric Broyden update. In this subsection we test the new a through numer- 
ical experiments, and make comparison with the one by Powell [47]. We will also observe 
whether positive CJ in (4.1) brings improvements or not. 

Powell [47] implements a trust-region algorithm based on the extended symmetric Broy- 
den update in Fortran 77 (it is an extended version of the software BOBYQA [46]), and we 
use this code for our experiments. We modify this code to get three solvers as follows for 
comparison. 

a. ) SYMB: a is always set to zero. It is equivalent to using the original symmetric Broyden 

update. 

b. ) ESYMBP: a is chosen according to Powell [47], as described in the second paragraph of 

Subsection 4.2. 

c. ) ESYMBS: a is chosen according to (4.14^.15). We set M = 10 in (4.14). 

In the code of Powell [47], the size of the interpolation set S keeps unchanged throughout 
the computation, and the user can set it to be any integer between n + 2 and (n + 1 ) (n + 2) /2. 
We choose 2n -|- 1 as recommended. The code of Powell [47] is capable of solving problems 
with box constraints. We set the bounds to infinity since we are focusing on unconstrained 
problems. In this case, SYMB can be regarded as another implementation of NEWUOA [44]. 

We assume that evaluating the objective function is the most expensive part for optimiza- 
tion without derivatives. Thus we might compare the performance of different derivative-free 
solvers by simply counting the numbers of function evaluations they use until termination, 
provided that they have found the same minimizer to high accuracy. However, this is in- 
adequate in practice, because, according to our numerical experiences, computer rounding 
errors could substantially influence the number of function evaluations. Here we present an 
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interesting example, which is inspired by the numerical experiments of Powell [47]. Con- 
sider the test problem BDQRTIC [9] with the objective function 

n-4 

F(x)= [(x2 + 2.^2^1 +3x2+2 +4x^+3+5x2)2 -4xi + 3], xeR", (4.16) 

and the starting point e = (1 1 ■•■ 1)^. Let P be a permutation matrix on W, and de- 
fine F(x) = F{Px). In theory, the solver SYMB should perform equally on F and F with 
initial vectors both set to e, as the implementation we are using is mathematically inde- 
pendent of variable ordering. But it is not the case in practice. For n = 20, we applied 
SYMB to F and F with P chosen specifically^^. The solver terminated with function val- 
ues 3.540906874607717D-H01 and 3.540906874698564DH-01, but the numbers of function 
evaluations were 760 and 4267, respectively. To make things more interesting, we applied 
ESYMBS to the same problems. This time the function values were 3.540906874638613D+01 
and 3.540906874609878D-I-01, while the numbers of function evaluations were 4124 and 
833. Notice that this dramatic story was entirely a monodrama of computer rounding errors. 
Similar story also happens to other derivative-free solvers. For instance, we permuted the 
variables in BDQRTIC (h = 20) with the method described above, and then solved the per- 
muted variants of BDQRTIC with the build-in function f minsearch of Matlab 7.6.0.324 
(R2008a), which is a derivative-free solver based on the Nelder-Mead simplex direct search 
[35] described in Lagarias, Reeds, Wright, and Wright [32]. The numbers of function eval- 
uations varied from 9217 to 70105 during the experiments'*. This example might reflect a 
common difficulty for derivative-free solvers. It suggests we pay enough attention to the 
instability cased by computer rounding errors when designing numerical tests for solvers of 
this type. 

We test the solvers SYMB, ESYMBS, and ESYMBP with the following method, in order 
to observe their performance in a relatively reliable way, and to inspect their stability to 
some extent. The basic idea is from the numerical experiments of Powell [47]. Given a 
test problem V with objective function F and starting point x, we randomly generate N 
permutation matrices f^- (/ = 1, 2, . . . , A'), and let 

Fi{x) = F{Pix), Xi = Pr^x, i=l,2, ...,N. (4.17) 

Then we employ the solvers to minimize F; starting form x, . For solver S, we obtain a vector 
#F = {#Fi , #F2, . . . , #Ff^), #Fi being the number of function evaluations required by S for 
Fi and x,. If S fails on Fi and x,, we replace #Fi with K#Fi as a penalty, K being a number 
bigger than one. Then compute the mean value and the standard deviation of vector #F: 

1 

meani#F) = -'^#Fi, (4.18) 

^ /=i 

and 

/ 1 N 

std(#F) = W ^ [#Fi - mean(#F)]2. (4.19) 



^ The matrix P was (eg es en e2o eig eio ei e-j eig ei2 eg ei4 en eg ejs ejs ei i e?), e, being the i-th 
coordinate vector in R^". 

■* The matrix P for 9217 was (ejs e^ ej eig ei eig eg e2o ei3 e^ eg ei eg eyi eio ei4 en en ejg es), and 
the one for 70105 was (e->, ej eij eg e|g eg ejg eyg ei eio ^4 e\\ ey^ e2o en ej ey eg ei4 ei2), e, being the i-th 
coordinate vector in R^". 
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When is reasonably large, mean(#F) estimates the average performance of solver S on 
problem V, and std(#F) reflects the stability of solver S on problem V. We may also com- 
pute the relative standard deviation of#F, namely, 

rstd(#F) = ^^, (4.20) 
mean(#F) 

to obtain a normalized measure of stability. Besides, min(#F) and max(#F) are meaningful. 
They approximate the best-case and worst-case performance of solver S on problem V. By 
comparing these statistics, we can reasonably assess the solvers under consideration. We use 
A' = 10 in practice. 

With the method addressed above, we tested SYMB, ESYMBP, and ESYMBS on BDQRTIC 
and six test problems used by Powell [44], namely, ARWHEAD, CHROSEN, PENALTY 1, 
PENALTY2, SPHRPTS, and VARDIM. The starting points and the initial/final trust-region 
radius were set according to Powell [42,44]. See Powell [42,44] for details of these test 
problems. For each problem, we solved it for dimensions 10, 12, 14, 16, 18, and 20. The 
experiments were carried out on a Dell PC with Linux, and the compiler was gfortran in 
GCC 4.3.3. 

All the solvers terminated successfully for every problem with every dimension, except 
the problem CHROSEN. The objective function of CHROSEN is the "chained Rosenbrock" 
function 

n-l 

^W = E [4(-«'---*li)' + (l-^'+i)']' (4.21) 
1=1 

and the staring point is i = — (1 1 • • • 1)^. Its global minimizer ise = (l 1 ••• 1), which 
is difficult to find. When a solver found a local minimizer different from e, we called it a 
failure and penalized ^ the number of function evaluations by multiplying 3. In Table 4.1, the 
number corresponding to a dimension n and a solver S presents how many times S failed 
when solving the ten permuted variants of «-dimensional CHROSEN. All the solvers failed 
for several times. 



Table 4. 1 : Number of Failures on CHROSEN 



n 


10 


12 


14 


16 


18 


20 


SYMB 


3 


1 


1 


2 


4 


4 


ESYMBP 





3 


3 





2 


4 


ESYMBS 





1 


1 


1 





4 



Table 4.2 reports the values of mean(#F) generated in our experiments. For each prob- 
lem and each dimension, we present the statistics in the order SYMB (the first line), ESYMBP 
(the second line), and ESYMBS (the third line). The values have been rounded to the nearest 
integer. Let us first focus on the comparison between ESYMBP and ESYMBS. Table 4.2 sug- 
gests that ESYMBS worked better than ESYMBP in the sense of average performance. Thus 
our choice of o improved the performance of the extended symmetric Broyden update on 
the test problems. It convinces us that the seminorm does bring a better interpretation 

Althougli the solvers under consideration are not designed for global optimization, we still think the pe- 
nalization is necessary, because when different minimizers are found, the corresponding numbers of function 
evaluations are not comparable. 
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of the update. Compared with SYMB, both ESYMBP and ESYMBS usually worked worse in the 
sense of average performance. Thus both choices of positive o rarely reduced the number 
of function evaluations in our experiments. However, we should notice that the positive a 
did not do much damage, even though very large o was allowed (for example, the a in 
ESYMBS could be as large as lO'^ in our experiments). 



Table 4.2: mean(#F) of SYMB, ESYMBP, and ESYMBS 





10 


12 


14 


16 


18 


20 




186 


239 


338 


426 


435 


784 


ARWHEAD 


274 


431 


393 


461 


552 


806 






328 


417 


tzj 




1 




1017 


1087 


1748 


1381 


1266 


1794 


BDQRTIC 


1399 


1929 


2406 


2804 


2967 


3221 




916 


1659 


2143 


2692 


2414 


3187 




713 


679 


819 


1113 


1753 


1861 


CHROSEN 


515 


1193 


1340 


925 


1471 


2165 




455 


646 


862 


962 


1001 


2013 




2900 


3599 


4360 


5637 


6265 


7752 


PENALTY 1 


4419 


4698 


5886 


6012 


7689 


8507 




3019 


3877 


4458 


5255 


6471 


7188 




588 


1656 


1018 


1515 


1476 


1479 


PENALTY2 


1299 


1609 


2516 


2320 


3065 


4006 




1213 


1457 


2365 


2325 


3427 


3644 




250 


397 


3285 


1831 


1598 


2708 


SPHRPTS 


492 


607 


3503 


1997 


1999 


2382 




256 


436 


2730 


1100 


1552 


3173 




1550 


2399 


3371 


3944 


4615 


5873 


VARDIM 


1997 


2828 


3611 


4394 


5716 


6327 




1392 


2316 


3208 


4064 


5230 


6219 



We visualize the comparison made above by presenting the performance profile [23] 
of mean(#f") in Fig. 4.1. The same as traditional performance profiles, the higher means 
the better. The performance profile supports the conclusions made above. Additionally, we 
give the performance profiles of min(#F) and max(#F) in Fig. 4.2 and Fig. 4.3, respec- 
tively. We notice that ESYMBP and ESYMBS hold better positions in Fig. 4.2 than in Fig. 4.1. 
Since mm{#F) reflects the best-case performance of solvers, we infer that, if all the solvers 
could perform their best, then both ESYMBP and ESYMBS would become more competitive. 
It suggests that these two solvers still have the potential to improve. 

Table 4.3 shows the values of std(#F) and rstd(#F) generated in our experiments. We 
present the statistics in the same order as in Table 4.2. For each solver and each problem, 
the values of std(#F) and rstd(#F) come together in the form "std(#F) /rstd(#F)". The 
values of std(#F) are rounded to the nearest integer, and only three digits of rstd(#F) are 
shown. It is clear that all the three solvers often suffered from computer rounding errors, 
except when solving ARWHEAD. Notice that the big values of std(#F) and rstd(#F) for 
CHROSEN are partially due to the penalization on failures (see Table 4.1). With the help 
of the performance profiles of std(#F) and rstd(#F) presented in Fig. 4.4 and Fig. 4.5, we 
conclude that ESYMBS performed relatively stabler than its competitors on the test problems. 
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Fig. 4.1: Performance Profile of mean(#f") 
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Fig. 4.2: Performance Profile of min {#F) 



5 Conclusions 



Least-norm interpolation has wide applications in derivative-free optimization. Motivated 
by its objective function, we have studied the //' Sobolev seminorm of quadratic functions. 
We gives an explicit formula for the //' seminorm of quadratic functions over balls. It turns 
out that least-norm interpolation essentially seeks a quadratic interpolant with minimal //' 
seminorm over a ball. Moreover, we find that the parameters xq and a in the objective 
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Fig. 4.3: Performance Profile of max 
Table 4.3: std(#F) and rstd(#F) of SYMB, ESYMBP, and ESYMBS 



n 10 12 14 16 18 20 





4/0.02 


7/0.03 


9/0.03 


24/0.06 


36/0.08 


24/0.03 


ARWHEAD 


18/0.07 


27/0.06 


36/0.09 


32/0.07 


46/0.08 


58/0.07 




8/0.03 


17/0.05 


19/0.05 


25/0.06 


25/0.05 


41/0.05 




384/0.38 


676/0.62 


713/0.41 


897/0.65 


762/0.60 


1355/0.76 


BDQRTIC 


257/0.18 


182/0.09 


602/0.25 


728/0.26 


1158/0.39 


1230/0.38 




485/0.53 


388/0.23 


143/0.07 


680/0.25 


1127/0.47 


1191/0.37 




458/0.64 


444/0.65 


474/0.58 


710/0.64 


1048/0.60 


1102/0.59 


CHROSEN 


54/0.11 


775/0.65 


844/0.63 


73/0.08 


884/0.60 


1206/0.56 




40/0.09 


359/0.56 


426/0.49 


476/0.50 


74/0.07 


1228/0.61 




259/0.09 


258/0.07 


344/0.08 


489/0.09 


338/0.05 


467/0.06 


PENALTYl 


349/0.08 


547/0.12 


709/0.12 


557/0.09 


496/0.06 


893/0.10 




282/0.09 


257/0.07 


374/0.08 


419/0.08 


735/0.11 


377/0.05 




326/0.55 


101/0.06 


697/0.68 


1006/0.66 


1059/0.72 


1014/0.69 


PENALTY2 


454/0.35 


426/0.26 


240/0.10 


1286/0.55 


701/0.23 


396/0.10 




119/0.10 


516/0.35 


211/0.09 


849/0.37 


199/0.06 


246/0.07 




14/0.06 


56/0.14 


410/0.12 


354/0.19 


122/0.08 


1230/0.45 


SPHRPTS 


155/0.31 


237/0.39 


712/0.20 


245/0.12 


630/0.32 


954/0.40 




17/0.07 


96/0.22 


461/0.17 


74/0.07 


175/0.11 


212/0.07 




225/0.14 


270/0.11 


316/0.09 


478/0.12 


481/0.10 


447/0.08 


VARDIM 


265/0.13 


409/0.14 


652/0.18 


543/0.12 


520/0.09 


623/0.10 




156/0.11 


262/0.11 


335/0.10 


389/0.10 


336/0.06 


768/0.12 
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Fig. 4.4: Performance Profile of std(#F) 
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Fig. 4.5: Performance Profile of rstd(#F) 



function determines the center and radius of the ball, respectively. These insights further our 
understanding of least-norm interpolation. 

With the help of the new observation, we have studied the extended symmetric Broy- 
den update (4.1) proposed by Powell [47]. Since the update calculates the change to the 
old model by a least-norm interpolation, we can interpret it with our new theory. We have 
discussed how to choose xq and a for the update according to their geometrical meaning. 
This discussion leads to the same xq used by Powell [47], and a very easy way to choose a. 



18 



Zaikun Zhang 



According to our numerical results, the new a works relatively better than the one proposed 
by Powell [47]. 

It is interesting to ask whether the extended symmetric Broyden update can bring im- 
provements to the original version (2.8) in derivative-free optimization. Until now, we can 
not give a positive answer. However, the numerical results of Powell [47] suggest that the 
extended version helps to improve the convergence rate. It would be desirable to theoret- 
ically investigate the local convergence properties of the updates. Results like (2.10) and 
(4.4) may help in such research. We think it is still too early to draw any conclusion on the 
extended symmetric Broyden update, unless we could find strong theoretical evidence. 

The numerical example on BDQRTIC deserves serious attention. It shows clearly that 
we should be very careful when comparing derivative-free solvers via numerical experi- 
ments. Our method of designing numerical tests seems reasonable. It reduces the influence 
of computer rounding errors, and meanwhile reflects the stability of solvers. We hope this 
method can lead to more reliable benchmarking of derivative-free solvers. 
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